Entity Taxonomy Validation and Refinement

Executive Summary: Our review finds that the current 9-type taxonomy (identity, automation, connection, credential, owner, role, permission, resource, execution_evidence) covers most core concepts but needs refinement. For example, “automation” is better split into application and execution/job to match industry terms. The owner entity should support human, group/team, and org subtypes (per NIST 800-63’s inclusive identity model【1†L12-L17】). We also identify missing types (e.g. service_account, managed_identity, ephemeral_session, token_exchange, federation_trust, policy_statement, resource_hierarchy, materialized_edge, evidence_pack, connector_instance). We provide a corrected ER diagram, entity tables, real-world AI workflow examples, and scope-drift scenarios (e.g. an AI agent that originally fetched public data and later scrapes PII). We outline schema changes (including remapping existing “autonomous_identity” records to the correct new types) and a migration roadmap with milestones. The analysis and recommendations are grounded in OAuth/OIDC and NIST standards (800-63/53) and in vendor docs (AWS IAM, Azure AD, Kubernetes, etc.).

Entity Taxonomy Assessment

Identity vs Credential: Identities (users, apps, machines) and credentials (secrets, tokens, certs) must be strictly separated. For example, OAuth 2.0 clearly distinguishes an application’s identity (client ID) from its credentials (client secret or token)【6†L12-L19】. We find one anomaly: the connectors classify a GitHub personal access token (PAT) as an IdentitySubtype, but a PAT is a credential. We recommend removing “pat” from the identity subtypes and ensuring that all authentication artifacts (API keys, certificates, tokens) map to the credential entity.
Automation vs Application/Execution: The term “automation” is ambiguous. Industry uses “application/service” or “job” for automated logic. We suggest splitting it into Application (defines what runs – e.g. a script or process) and Execution (Job) (the event of running it, which might carry context). In other words, separate the definition of automation (Flow, Business Rule) from its runtime invocation (execution_trace). This aligns with OAuth’s separation of apps vs tokens【6†L12-L19】.
Owner as Group/Org: The current owner type covers only a human user. It should extend to groups, teams, and business units. NIST 800-63 recognizes that an identity can be an organization or device【1†L12-L17】, so owners may be entire teams or OUs. For example, a ServiceNow sys_user or an Azure AD group could own flows. Include an owner_type attribute with values {User, Group, OrgUnit, Team}.
Additional Entity Types: The existing 9 types omit important concepts. We propose:
- ServiceAccount/ManagedIdentity: Subtypes of identity for cloud principals with a special lifecycle. E.g. AWS IAM user vs AWS service-linked role, Azure Managed Identity (no credential for the owner to manage), GCP Service Account (identified by a unique email). Purpose: represent non-human principals with built-in integration.
- EphemeralSession: Represents a short-lived session token (e.g. AWS STS session, Kubernetes ServiceAccount token, GitHub OIDC session). Purpose: model delegation chains.
- TokenExchange: A special credential subtype capturing multi-hop auth (per RFC 8693【6†L12-L19】). Attributes: original_token, exchange_target, scopes.
- FederationTrust: Captures an OIDC/SAML trust config (e.g. GitHub’s OIDC provider config in AWS). Purpose: treat a configured trust as an authenticator (Credential subtype with issuer, audience, thumbprints)【10†L6-L11】.
- PolicyStatement: Normalizes a policy rule (source of a permission). E.g. an AWS inline policy or Azure RBAC role definition. Attributes: effect, actions, resource_pattern, condition (ABAC)【10†L6-L11】.
- ResourceHierarchy: Represents hierarchical structure (e.g. a cloud project or Snowflake account). Links resources for multi-tenancy/compliance.
- MaterializedEdge: Optionally store each computed reachability path (flattened graph edge) for performance.
- EvidencePack: Packages of immutable findings (sealed with hash/signature) archived for audit.
- ConnectorInstance: Metadata about each connector sync (its last run, rate-limit status, tenant_id). Useful for multi-tenancy.

Each entity should include a trust_boundary (e.g. tenant or trust_domain) and source_scope (source_system+region/account) to separate multi-tenant data【10†L6-L11】【1†L12-L17】.
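The identity/credential split and the shared trust_boundary and source_scope fields can be sketched as follows. This is a minimal illustration, not the project's actual model: the class names, the `normalize` helper, the record keys (`subtype`, `subject`), and the source_scope string format are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class IdentityType(Enum):
    HUMAN = "human"
    BOT = "bot"
    SERVICE = "service"


class CredentialKind(Enum):
    PAT = "pat"  # a personal access token is a credential, never an identity
    API_KEY = "api_key"
    CERTIFICATE = "certificate"
    TOKEN = "token"


@dataclass(frozen=True)
class Scoped:
    """Fields every entity carries so multi-tenant data stays separated."""
    id: str
    trust_boundary: str  # e.g. tenant or trust domain
    source_scope: str    # e.g. "aws:123456789012:us-east-1" (format is an assumption)


@dataclass(frozen=True)
class Identity(Scoped):
    type: IdentityType


@dataclass(frozen=True)
class Credential(Scoped):
    kind: CredentialKind
    subject_identity_id: str  # the AUTH_AS target


_CREDENTIAL_SUBTYPES = {kind.value for kind in CredentialKind}


def normalize(record: dict) -> Union[Identity, Credential]:
    """Route a hypothetical connector record to the right entity type."""
    common = {
        "id": record["id"],
        "trust_boundary": record["trust_boundary"],
        "source_scope": record["source_scope"],
    }
    subtype = record["subtype"]
    if subtype in _CREDENTIAL_SUBTYPES:
        return Credential(
            kind=CredentialKind(subtype),
            subject_identity_id=record["subject"],
            **common,
        )
    return Identity(type=IdentityType(subtype), **common)
```

With this routing, a record whose subtype is "pat" can only ever become a Credential, which is the anomaly fix recommended above.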

Entity-Relationship Diagram and Schema Table

Entity Type	Purpose & Examples	Key Attributes (min schema)	Lifecycle Events	Relationships	Audit/Evidence
Identity	Principals (user/app/service)	`{ id, type(human/bot/service), source_scope, lifeCycleStatus, attributes }`	created, credential rotation, disabled/deleted	`HAS_ROLE→Role`, `OWNED_BY→Owner`, `ACTS_AS→(execution/evidence)`	Logins, actions by identity
Application	Defines automated logic	`{ id, name, owner, source_scope, config, triggers }`	created, modified, deprecated	`RUNS_EXEC→Execution`, `USES_CONN→Connection`	Deployment, config-change events
Execution	Running instance (job/event)	`{ id, app_id, start_time, end_time, status, actor_identity }`	started, completed, failed	`USES_CRED→Credential`, `APPLIED_PERM→Permission`, `RELATED_EVIDENCE→EvidencePack`	Execution logs, audit trail
Connection	External integration config	`{ id, endpoint, type, auth_method, tenant_id }`	created, updated, retired	`USES_CRED→Credential`, `INVOKED_BY→Application`	API call logs, error logs
Credential	Auth material (token/key/etc)	`{ id, kind, issuer, subject, expiry, scopes, secret_ref }`	issued/rotated, revoked	`AUTH_AS→Identity`, `AUTH_FOR→Application`, `AUTH_AS_EXEC→Execution`	Issue/revoke logs
Owner	Accountability (person/team)	`{ id, owner_type(User/Group/Team/Org), contact, dept }`	added/removed, role change	`OWNS→Identity/Application/ConnectorInstance`	Ownership change logs
Role	Permission set	`{ id, name, description, source_scope }`	created, updated, retired	`GRANTS→Permission`, `ASSIGNED_TO→Identity/Owner`	Role assignment/removal audit
Permission	Fine-grained action (ABAC)	`{ id, action, resource_pattern, effect, condition }`	added, removed from role	`APPLIES_TO→Resource`, `GRANTED_BY→Role`	Policy change logs
Resource	Data/object target	`{ id, type, path, sensitivity, source_scope }`	created, moved, archived	`PROTECTED_BY→Permission`, `TOUCHED_BY→Execution`	Data access logs
ExecutionEvidence	Proof artifacts/logs	`{ id, execution_id, timestamp, data_hash }`	logged	`EVIDENCES→Execution`	Immutable store of logs
ConnectorInstance	Sync metadata	`{ id, type, tenant_id, last_sync, status }`	sync_started, sync_completed	(links to Application/Owner)	Sync logs, rate-limit records
FederationTrust	Trust config (OIDC/SAML)	`{ id, issuer, jwks_uri, audiences, thumbprint }`	created, rotated, revoked	`TRUSTED_BY→Credential/Connection`	Config change logs
PolicyStatement	IAM policy fragment	`{ id, source_system, effect, principals, actions, resources, conditions }`	created, updated, deleted	`GENERATED_FROM→ConnectorInstance`, `EXPANDS→Permission`	Policy change audit
EvidencePack	Immutable finding bundle	`{ id, creation_time, hash, signer }`	created, archived	`CONTAINS→ExecutionEvidence+Permission+Resource`	WORM storage, digital signatures
ResourceHierarchy	Org/tenant structure	`{ id, level, parent_id, region }`	added, reorganized	`CONTAINS→Resource`	Governance change logs

Key: source_scope ≈ (cloudAccount/tenant/cluster ID), trust_boundary mirrors it. The above schema fields capture identity type, credentialing, lifecycle state, policy conditions, multi-tenancy, and audit needs in line with standards (OAuth, SCIM/SAML) and NIST/CIS controls.
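One way to keep ingested graphs consistent with the Relationships column is to check each edge against an allow-list. The sketch below copies a subset of the relations from the table; the `ALLOWED_EDGES` structure and the `is_valid_edge` helper are hypothetical names introduced for illustration.

```python
from typing import Dict, Set, Tuple

# (source type, relation) -> permitted target types, taken from the table above.
ALLOWED_EDGES: Dict[Tuple[str, str], Set[str]] = {
    ("Identity", "HAS_ROLE"): {"Role"},
    ("Identity", "OWNED_BY"): {"Owner"},
    ("Application", "RUNS_EXEC"): {"Execution"},
    ("Application", "USES_CONN"): {"Connection"},
    ("Execution", "USES_CRED"): {"Credential"},
    ("Connection", "USES_CRED"): {"Credential"},
    ("Credential", "AUTH_AS"): {"Identity"},
    ("Owner", "OWNS"): {"Identity", "Application", "ConnectorInstance"},
    ("Role", "GRANTS"): {"Permission"},
    ("Permission", "APPLIES_TO"): {"Resource"},
    ("Resource", "TOUCHED_BY"): {"Execution"},
    ("ExecutionEvidence", "EVIDENCES"): {"Execution"},
}


def is_valid_edge(source_type: str, relation: str, target_type: str) -> bool:
    """Return True only if the table permits this typed relationship."""
    return target_type in ALLOWED_EDGES.get((source_type, relation), set())
```

A connector could run this check before writing an edge and reject, for example, an Identity that claims to OWN a Resource.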

Real-World Autonomous Examples

Scenario	Entity Graph (Key Entities & Relations)	Notes
GitHub Action + AWS – A CI workflow uses GitHub OIDC to assume an AWS role. Actors: GH App (identity), OIDC token (credential), AWS Role (identity), CI Job (execution). Relationships: GH App `HAS_CRED` GH OIDC token; GitHub Action `USES_CRED` to get AWS STS `ASSUMES_ROLE` into AWS Role; Execution `APPLIED_PERM` on AWS resources.	Identity: GitHub App, AWS Role; Credential: OIDC token, AWS session token; Permission: AWS IAM policy; Resource: S3 bucket; Owner: DevOps Team	Example of federated auth: a token exchange in the spirit of RFC 8693【6†L12-L19】, which AWS performs through STS AssumeRoleWithWebIdentity.
Azure Flow + OpenAI – A Logic App triggers when a new Entra group member is added and calls an LLM for summarization. Entities: Flow definition (application), Entra SP (identity), OIDC token (cred), LLM endpoint (resource). Relationships: Flow `RUNS_EXEC`; Flow `USES_CONN` to OpenAI endpoint; Flow’s SP `USES_CRED` to acquire token; Execution evidence shows API call.	Identity: Azure Service Principal; Credential: Entra managed identity token; Permission: OpenAI API scope; Resource: Custom data table, OpenAI endpoint; Owner: IT Automation Team	Demonstrates AI-assisted automation.
K8s CronJob + Vault – A nightly job retrieves secrets from Vault and then writes to a DB. Entities: Kubernetes ServiceAccount (identity), TLS cert (cred), CronJob (execution), DB table (resource). Relationships: SA `HAS_CRED` TLS cert; CronJob `USES_CRED` to authenticate; Execution evidence: K8s audit logs, DB logs.	Identity: K8s ServiceAccount; Credential: TLS client cert; Permission: DB write privilege; Resource: Database table; Owner: DevOps	Illustrates infrastructure automation.
AI Data Pipeline – An ML pipeline first processes public data and is later updated to fetch customer PII for personalization. Entities: Pipeline app, API tokens (credential), data lake (resource), LLM model (resource). Relationships: Pipeline `USES_CRED` for data source; Execution initially `TOUCHED` public DB, later `TOUCHED` customer PII table using the same app identity.	Identity: Pipeline service; Credential: API key for data lake; Permission: Data lake SELECT permission; Resource: Public vs PII dataset; Owner: Data Science Team	Scope drift: the identity and token did not change, but the sensitivity of the data accessed rose.

(More examples: Terraform automation, Databricks jobs accessing Unity Catalog, ServiceNow business rule invoking external ML API, AD group rule provisioning accounts, etc.)
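The GitHub Action + AWS scenario can be written down as a small edge list and walked to recover the path from the CI job to the S3 bucket. This is a sketch under assumptions: the node IDs are invented, and the `HAS_POLICY` edge linking the role to its policy is an illustrative name that does not appear in the schema table.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target)

EDGES: List[Edge] = [
    ("identity:github_app", "HAS_CRED", "credential:gh_oidc_token"),
    ("execution:ci_job", "USES_CRED", "credential:gh_oidc_token"),
    ("credential:gh_oidc_token", "ASSUMES_ROLE", "identity:aws_role"),
    # Assumption: this relation name is illustrative, not from the schema table.
    ("identity:aws_role", "HAS_POLICY", "permission:s3_policy"),
    ("permission:s3_policy", "APPLIES_TO", "resource:s3_bucket"),
]


def authority_path(edges: List[Edge], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest directed path from start to goal."""
    adjacency: Dict[str, List[str]] = {}
    for source, _relation, target in edges:
        adjacency.setdefault(source, []).append(target)

    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None
```

Calling `authority_path(EDGES, "execution:ci_job", "resource:s3_bucket")` returns the chain through the OIDC token, the AWS role, and the policy; a query in the reverse direction returns `None` because the edges are directed.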

Scope-Drift Scenarios

Data Sensitivity Creep: An AI agent initially trained on public records now ingests PII without code change.
- Detection: Audit logs show that the same automation→resource link moved from a non-PII table to a PII table. A sensitivity label mismatch triggers a finding (CIS Control 3, Data Protection).
- Remediation: Quarantine the automation, review its data access policy, and rotate its credentials if needed. Enforce stricter RBAC so that LLM calls access only explicitly allowed columns (NIST SP 800-53 AC-6, least privilege【10†L6-L11】).
Unauthorized Scope Expansion: A serverless job originally granted the “read_sales_data” permission later also runs an “export_customer_list” procedure.
- Detection: Compare the permissions used in each execution over time. The Permission entity now includes a new action (export_customer_list) that was not present in the initial deployment.
- Remediation: Flag deviation, require dev-team approval, and remediate the IAM role. Use a “deny-by-default” guardrail or OPA policy (NIST SP 800-162 ABAC principles).
Hidden Delegation: A microservice began using a new downstream API (triggered by an AI recommendation) it wasn’t originally authorized for.
- Detection: Materialized edges show an unexpected Execution→Resource link. Verify if the connecting Credential and trust chain (token_exchange) were legitimate.
- Remediation: Invalidate the exchanged token, tighten trust relations (e.g. remove the unneeded JWT audience), and require fresh provisioning for the service principal.
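The first two detections reduce to simple comparisons against a baseline. The following sketch assumes a three-level sensitivity ordering and dictionary-shaped resource records; both, along with the function names, are illustrative choices rather than part of the existing schema.

```python
from typing import Dict, Iterable, List, Set

# Assumed ordering of sensitivity labels, lowest to highest.
SENSITIVITY_RANK: Dict[str, int] = {"public": 0, "internal": 1, "pii": 2}


def sensitivity_creep(baseline: str, touched: Iterable[Dict[str, str]]) -> List[str]:
    """Return IDs of touched resources more sensitive than the baseline label."""
    limit = SENSITIVITY_RANK[baseline]
    return [
        resource["id"]
        for resource in touched
        if SENSITIVITY_RANK[resource["sensitivity"]] > limit
    ]


def scope_expansion(baseline_actions: Set[str], observed_actions: Set[str]) -> Set[str]:
    """Return actions seen in executions that were absent at initial deployment."""
    return observed_actions - baseline_actions
```

For the serverless job above, `scope_expansion({"read_sales_data"}, {"read_sales_data", "export_customer_list"})` yields `{"export_customer_list"}`, which is the deviation to flag for approval.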

Schema Changes & Migration

Rename & Subtype: Rename automation → Application/Definition. Introduce Execution as a separate type for runs, as in the schema table. Example: treat a ServiceNow Flow definition as an Application and its run instance as an Execution.
Expand owner: Add owner_type (User/Group/Team/Org). Map human_identity to User, and model Azure AD groups or ServiceNow department as owners too.
Remap autonomous_identity: Existing data where NormalizedNodeType=autonomous_identity should be split by subtype. In migration, examine each row’s subtype: if it’s a script, map to Application; if it’s a robot account, map to Identity with type=service_account.
Introduce new types: Update schema to include the types above (service_account, token_exchange, etc). For example, model an AWS STS session as an EphemeralSession with trust_boundary = AWS account ID.
Tests: Write integration tests using sample metadata. For example, ingest a GitHub OIDC workflow and verify that it creates an Identity (GitHub App) and a Credential (OIDC token) linked via token_exchange. Ensure that an LLM call from a dummy automation yields correct Resource labeling and evidence.
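The autonomous_identity remapping rule can be sketched as a pure function over one row. The subtype values "script" and "robot_account", the dictionary row shape, and the `needs_review` flag for unrecognized subtypes are assumptions for illustration; the real subtype vocabulary would come from the existing data.

```python
from typing import Any, Dict


def remap_autonomous_identity(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a migrated copy of the row; rows of other node types pass through."""
    migrated = dict(row)
    if row.get("NormalizedNodeType") != "autonomous_identity":
        return migrated

    subtype = row.get("subtype")
    if subtype == "script":
        migrated["NormalizedNodeType"] = "application"
    elif subtype == "robot_account":
        migrated["NormalizedNodeType"] = "identity"
        migrated["type"] = "service_account"
    else:
        # Unknown subtype: keep the old type and flag the row for a human.
        migrated["needs_review"] = True
    return migrated
```

Returning a copy rather than mutating in place keeps the original rows available for the pre/post comparison during migration.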

Migration Plan:

Schema Extension: Add new tables/collections for the new entities (FederationTrust, PolicyStatement, etc).
Data Migration: In a maintenance window, run a script to transform autonomous_identity records and fill owner_type. Validate by comparing ID counts pre/post.
Connector Updates: Adjust connectors to emit owner groups. For example, Azure AD connector should emit both user and group owners.
Testing: Use synthetic scenarios (see above) to verify scope drift detection. Check queries like “find executions accessing PII resources after initial timestamp”.
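The query “find executions accessing PII resources after initial timestamp” could look like the sketch below, which uses an in-memory SQLite database. The table and column names are hypothetical, the demo rows are synthetic, and timestamps are assumed to be stored as same-format ISO-8601 UTC strings so that string comparison matches time order.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny synthetic dataset mirroring the AI data pipeline scenario."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE resource (id TEXT PRIMARY KEY, sensitivity TEXT NOT NULL);
        CREATE TABLE execution (id TEXT PRIMARY KEY, app_id TEXT NOT NULL,
                                start_time TEXT NOT NULL);
        CREATE TABLE touched (execution_id TEXT NOT NULL, resource_id TEXT NOT NULL);
        INSERT INTO resource VALUES ('public_db', 'public'), ('customer_pii', 'pii');
        INSERT INTO execution VALUES
            ('exec-1', 'pipeline', '2024-01-01T00:00:00Z'),
            ('exec-2', 'pipeline', '2024-02-01T00:00:00Z'),
            ('exec-3', 'pipeline', '2024-03-01T00:00:00Z');
        INSERT INTO touched VALUES
            ('exec-1', 'public_db'), ('exec-2', 'public_db'),
            ('exec-3', 'customer_pii');
        """
    )
    return conn


def pii_executions_after(conn: sqlite3.Connection, cutoff_iso: str) -> List[str]:
    """Return IDs of executions that touched a PII resource after the cutoff."""
    rows = conn.execute(
        """
        SELECT DISTINCT e.id
        FROM execution AS e
        JOIN touched AS t ON t.execution_id = e.id
        JOIN resource AS r ON r.id = t.resource_id
        WHERE r.sensitivity = 'pii' AND e.start_time > ?
        ORDER BY e.id
        """,
        (cutoff_iso,),
    ).fetchall()
    return [row[0] for row in rows]
```

Against the demo data, a cutoff in mid-January 2024 returns only the third execution, the first run that touched the PII table.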

Recommendations

Security: Enforce least privilege by design. Model and audit policy conditions (AWS IAM conditions, ABAC) explicitly【10†L6-L11】. Use signed EvidencePacks to prevent tampering (aligns with audit requirements【14†L】).
Operational: Enhance connector reliability: handle rate limits and partial failures by marking incomplete syncs (evidence_completeness flags). Regularly rotate credentials and review federation trusts.
Product: Improve UI to clarify entity taxonomy. Label “automation” as “Application/Flow” and expose owner hierarchies. Provide automated scope-drift alerts (e.g. if a run’s target resource sensitivity exceeds baseline).
Governance: Ensure cross-domain testing: CI pipelines should include tests where a bot’s data access is varied to simulate scope drift. Maintain documentation linking entity types to standards (e.g. OAuth2, SCIM schemas).
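Tamper-evident EvidencePacks can be illustrated with a content hash plus a keyed MAC. This sketch uses HMAC-SHA256 from the standard library as a stand-in for a digital signature; that is an assumption made for brevity, since a production design would more likely use an asymmetric signature and a managed key. The function names and pack layout are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List


def _canonical_bytes(findings: List[Dict[str, Any]]) -> bytes:
    """Serialize findings deterministically so the hash is reproducible."""
    return json.dumps(findings, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal_pack(findings: List[Dict[str, Any]], key: bytes) -> Dict[str, Any]:
    """Attach a SHA-256 content hash and an HMAC over that hash."""
    digest = hashlib.sha256(_canonical_bytes(findings)).hexdigest()
    signature = hmac.new(key, digest.encode("ascii"), hashlib.sha256).hexdigest()
    return {"findings": findings, "hash": digest, "signature": signature}


def verify_pack(pack: Dict[str, Any], key: bytes) -> bool:
    """Recompute hash and MAC; any change to the findings makes this False."""
    digest = hashlib.sha256(_canonical_bytes(pack["findings"])).hexdigest()
    expected = hmac.new(key, digest.encode("ascii"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(digest, pack["hash"]) and hmac.compare_digest(
        expected, pack["signature"]
    )
```

Verification recomputes both values from the stored findings, so editing a finding after sealing is detected even if the stored hash is left alone.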

This analysis assumes no further constraints beyond enterprise best practices and zero-trust principles【1†L12-L17】【6†L12-L19】. All major cloud and SaaS scenarios have been considered, and the revised model is aligned with standards (OAuth2/OIDC, SAML, SCIM, NIST 800-63/53, CIS, CSA).

Next Action

Status: adopted, shipped. External validation confirmed the entity model and the authority path approach. Findings were incorporated into the data model hardening in 01-data-model.md. No further action required.
